GitHub Repo

configs

data access

Initial Explorations

1. distribution of logistics_dropoff_distance

- After converting to logistic form, it might take the shape of a left-skewed normal distribution

2. mean & median & max & min of logistics_dropoff_distance

3. order count distribution per customers

3. How logistics_dropoff_distance changes as the total order count per hour increases

4. ratio of 1st order customers

5. delivery_geohash_precision8 decoding to lat - lon

6. Unique Customer Count at Location (GeoHash Decoded)

7. DropOff Distance at Location (GeoHash Decoded) ( < 1000)

8. Order Count per Customer at Location (GeoHash Decoded)

9. geohash base average dropoff_distance with precision 7

10. number of delivery_postal_code == number of delivery_geohash_precision8 ?

11. number of customers who have only 1 order

12. number of geohash locations

13. Customer Location Of Unique Customer Count (geoHash precision=8)

14. dropoff_distance Breakdown By Hour & Day Of Week

- created_timestamp_local is a local timestamp, so we can treat 17 - 23 and 12 - 14 as rush-hour ranges

15. binary features

16. order value distributions

17. order_value - logistics_dropoff_distance correlation

18. order_items_count - logistics_dropoff_distance Distribution

19. delivery_postal_code - logistics_dropoff_distance Frequency Distribution

20. customer order sequence
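The rush-hour note under item 14 can be sketched as a small helper. This is only an illustration: the function names are hypothetical, and the window boundaries (start hour inclusive, end hour exclusive) are an assumption, since the outline gives only the ranges 12 - 14 and 17 - 23.

```python
from datetime import datetime

# Assumed rush-hour windows as (start_hour, end_hour) pairs.
# Treating the end hour as exclusive is an assumption, not stated in the outline.
RUSH_HOUR_WINDOWS = ((12, 14), (17, 23))


def is_rush_hour(created_timestamp_local: datetime) -> bool:
    """Return True if the local order timestamp falls in an assumed rush-hour window."""
    hour = created_timestamp_local.hour
    return any(start <= hour < end for start, end in RUSH_HOUR_WINDOWS)


def hour_and_weekday(created_timestamp_local: datetime) -> tuple:
    """Return (hour, weekday) keys for the breakdown; weekday 0 is Monday."""
    return created_timestamp_local.hour, created_timestamp_local.weekday()
```

Because created_timestamp_local is already in local time, no timezone conversion is applied before reading the hour.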

data cleanup

Feature Engineering

1. GeoHash Decoding Latitude - Longitude

2. geohash base average dropoff_distance with precision 8

3. geohash base average dropoff_distance with precision 7

4. Binary Features

5. order count per customer

6. Average logistics_dropoff_distance per hour

7. Average logistics_dropoff_distance per day part

8. Average logistics_dropoff_distance per order_items_count

9. Average logistics_dropoff_distance per delivery_postal_code

10. logistics_dropoff_distance Normalization

11. order value normalization
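Items 1 to 3 above (geohash decoding and geohash-based averages) can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the function names are hypothetical, and the precision-7 average relies on the fact that the first 7 characters of a precision-8 geohash identify its enclosing precision-7 cell.

```python
from collections import defaultdict

# Standard geohash base-32 alphabet (no "a", "i", "l", "o").
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def decode_geohash(geohash: str) -> tuple:
    """Decode a geohash to the (latitude, longitude) of its cell centre."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # bits alternate, starting with longitude
    for char in geohash:
        index = _BASE32.index(char)
        for shift in range(4, -1, -1):
            bit = (index >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
            else:
                mid = (lat_lo + lat_hi) / 2
                lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
            is_lon = not is_lon
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2


def average_distance_by_prefix(rows, precision: int = 7) -> dict:
    """Average dropoff distance per geohash cell at the given precision.

    rows: iterable of (delivery_geohash_precision8, logistics_dropoff_distance).
    """
    totals = defaultdict(lambda: [0.0, 0])
    for geohash, distance in rows:
        cell = totals[geohash[:precision]]
        cell[0] += distance
        cell[1] += 1
    return {prefix: total / count for prefix, (total, count) in totals.items()}
```

Calling average_distance_by_prefix with precision=8 gives the precision-8 average from item 2, so one helper covers both features.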

Correlation of Features

Feature Reduction

2. explained variance calculation

3. feature reduction rule
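The outline does not say how the explained variance is computed or what the reduction rule is. One common possibility, shown here only as an assumption, is to keep the smallest number of components whose cumulative explained variance ratio reaches a threshold. The function names and the 0.95 default are hypothetical.

```python
def explained_variance_ratio(component_variances):
    """Share of total variance carried by each component."""
    total = sum(component_variances)
    return [variance / total for variance in component_variances]


def components_to_keep(ratios, threshold=0.95):
    """Smallest number of components whose cumulative ratio reaches the threshold.

    The threshold is an assumed rule; the outline does not state one.
    """
    cumulative = 0.0
    for count, ratio in enumerate(sorted(ratios, reverse=True), start=1):
        cumulative += ratio
        if cumulative >= threshold:
            return count
    return len(ratios)
```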

Train Process

1. train - test split (train 75%, test 25%)

2. parameter tuning

3. train NN model

4. model performance (Train - Validation Set Loss per Epoch)

5. Let's sample from the data set, predict, and check the distribution of the residuals

6. Residuals Distribution

7. sample prediction results export
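The 75 / 25 split in step 1 and the residual check in steps 5 and 6 can be sketched without any ML library. The helper names and the fixed seed are assumptions for illustration; the actual project may well use a library routine instead.

```python
import random


def train_test_split_rows(rows, train_fraction=0.75, seed=42):
    """Shuffle rows reproducibly and split them into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seed is an arbitrary assumed value
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def residuals(actual, predicted):
    """Residual per sample: actual dropoff distance minus predicted distance."""
    return [a - p for a, p in zip(actual, predicted)]
```

A residual distribution centred near zero with no heavy tail on one side would suggest the model is not systematically over- or under-predicting.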

Model Improvements

Additional Features for prediction model

Propose some front-end features that might help reduce dropoff_distance.